Abstract: In this mini-review we describe the different strategies for rational protein engineering and summarize the computational 
tools available. Computational tools can be used to design focused libraries, to predict sequence-function relationships, or for 
structure-based molecular modelling. This also includes de novo design of enzymes. Examples of protein engineering of aldolases 
and transaldolases are given in the second part of the mini-review. 

1. Introduction 



Asymmetric aldol additions are a cornerstone of preparative 
organic chemistry. Concomitant with the formation of a C-C bond 
between a nucleophile (donor) and an electrophile (acceptor), one or 
two new stereocenters are created. This type of reaction can also be 
carried out by enzymes, such as aldolases and transaldolases. In most 
cases, these enzymes strictly control the stereoconfiguration at the 
newly formed stereocenter(s). Aldolases are applied in biocatalysis for 
the synthesis of amino acid and carbohydrate derivatives. For more 
details about aldolases and their biocatalytic application, see recent 
reviews [1-4]. 

Mechanistically, class I and class II aldolases are distinguished. 
Class I aldolases form a Schiff base intermediate between a conserved 
Lys in the active site and the carbonyl carbon atom of the donor 
substrate, i.e. usually a ketone. By proton abstraction an enamine 
intermediate is formed which attacks the carbonyl carbon atom of the 
acceptor aldehyde. Class I aldolases do not require any cofactor and 
they exhibit a typical (β/α)8-barrel fold. Class II aldolases depend on 
a divalent cation which acts as a Lewis acid. The metal ion helps to 
deprotonate the donor substrate and stabilises the enolate formed. 
Therefore, these aldolases can be inhibited by EDTA. According to 
their structure and sequence class I and class II aldolases do not show 
any significant homology. Apparently, they evolved separately. 

Aldolases usually accept a wide range of acceptor substrates which 
allows a broad range of synthetic applications. On the other hand, 
they are in general very specific for their donor substrate. Hence, they 
are classified as (i) dihydroxyacetone phosphate (DHAP) dependent 
aldolases, (ii) dihydroxyacetone (DHA) dependent aldolases, (iii) 
pyruvate/2-oxobutyrate dependent aldolases, (iv) acetaldehyde 
dependent aldolases and (v) glycine/alanine dependent aldolases [1]. 
Glycine/alanine dependent aldolases are neither class I nor class II 
aldolases but require pyridoxal phosphate (PLP) as cofactor. 
Structurally, they belong to the fold type I family of PLP dependent 
enzymes. 

Transaldolases (Tal) transfer a DHA moiety from a ketose donor 
to an aldehyde acceptor. A new C-C bond is formed with 3S,4R 
stereoconfiguration. Mechanistically (Schiff base intermediate) and 
structurally ((β/α)8-barrel fold), Tals show similarity to class I 
aldolases. However, compared to DHAP dependent class I aldolases, 
the conserved Lys residue has moved to a different β-strand, suggesting a 
circular permutation of the protein sequence [5]. Tals are almost 
ubiquitous enzymes, and according to their sequence similarity they 
were divided into five subfamilies. The wild-type enzymes have not found 
much application in biocatalysis. For more details on the Tal enzyme 
family, see recent publications [6, 7]. 

Using computational tools protein engineering within this enzyme 
family was directed towards the following aims: (i) the discovery of 
new enzymes, (ii) the differentiation between enzyme families or 
subfamilies, (iii) the engineering of enzymes for new applications and 
(iv) the design of novel aldolases. In this mini-review we will first 
describe the different strategies for protein engineering and summarize 
the computational tools available. In the second part, we will give 
examples from the enzyme family of aldolases and transaldolases. 

2. Computational tools for protein engineering 

Isolated enzymes have been successfully applied for 
bioconversions provided the enzyme is stable, soluble, and easy to 
produce. However, in most cases the commercially available enzymes 
are not optimal for the desired chemical process. Therefore, in silico, 
in vitro, and in vivo strategies have been developed to screen for 
appropriate enzymes from the natural pool [8]. However, natural 
enzymes rarely have the combined properties necessary for industrial 
chemical production such as high activity, high selectivity, broad 
substrate specificity towards non-natural substrates, no inhibition by 
substrate or product, and a high stability in organic solvents and at 
high substrate or product concentrations [9]. Therefore, protein 
engineering has been successfully applied to design enzymes with new 
substrate spectra and new functions as catalysts for unnatural 
substrates, and to fine-tune bottleneck enzymes in metabolic 
engineering [10]. Three major computational strategies are currently 
applied to support protein engineering: the design of focused libraries 
for directed evolution, methods to predict sequence-function 
relationships, and structure-based molecular modelling methods. 

2.1 Design of focused libraries for directed evolution 

Directed evolution has proven to be an effective method to 
improve the properties of enzymes (for aldolases see review [11]). 
The unguided use of random mutagenesis methods, however, results 
in protein libraries with millions of members which still only sample a 
small fraction of the vast sequence space possible [12]. Recently, 
several computational approaches have been suggested to improve the 
efficiency of the directed evolution by enriching the library and 
reducing the library size substantially, taking into account further 
information. An enrichment of the library may be achieved by 
considering structure information on residues that are involved in 
substrate binding. This approach has guided the design of highly 
focused libraries and resulted in mutants with increased selectivity 
[13-15] or shifted substrate specificity [16-19]. The size of the 
library can be reduced by limiting the possible amino acid alphabet, 
i.e. not all 20 amino acids but a subset is used instead, depending on 
the desired interactions [20]. To estimate the screening effort 
necessary the CASTER tool was developed by the Reetz group. A 
comprehensive statistical analysis of a large number of favourable and 
less favourable mutants identified hot spot regions that are beneficial 
to enzyme activity and stability [21-23]. Most of these methods to 
search for promising mutation sites require expert knowledge in 
bioinformatics which may not be present in experimentally oriented 
research groups. Therefore, online tools that require little to no 
bioinformatics knowledge have become popular. Meta-tools such as 
the HotSpot Wizard [24] offer a complete workflow to assess 
promising mutation sites by combining a variety of methods such as 
Catalytic Site Atlas [25], CASTp [26], CAVER [27], BLAST [28], 
MUSCLE [29], as well as sequence and structure databases such as 
UniProt [30], NCBI GenBank [31], and PDB [32]. 
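
The screening effort for a saturation mutagenesis library can be estimated with a standard sampling formula, T = -V ln(1 - F), where V is the number of distinct codon combinations and F the desired library coverage. The sketch below is not the CASTER implementation; it assumes that every codon of a degenerate scheme occurs with equal probability, and the function names and the 95% default coverage are illustrative choices. NNK (32 codons, all 20 amino acids) and NDT (12 codons, a reduced alphabet of 12 amino acids) serve as examples.

```python
import math

# Number of distinct codons per randomized position for two common
# degenerate codon schemes (NNK: 32 codons, NDT: 12 codons).
CODONS_PER_SITE = {"NNK": 32, "NDT": 12}


def library_size(scheme: str, n_sites: int) -> int:
    """Number of distinct codon combinations when n_sites are randomized together."""
    return CODONS_PER_SITE[scheme] ** n_sites


def clones_for_coverage(n_variants: int, coverage: float = 0.95) -> int:
    """Clones to screen so that a fraction `coverage` of all variants is expected
    to be sampled, assuming equal probability for every variant: T = -V ln(1 - F)."""
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    return math.ceil(-n_variants * math.log(1.0 - coverage))


if __name__ == "__main__":
    for scheme in ("NNK", "NDT"):
        for sites in (1, 2, 3):
            size = library_size(scheme, sites)
            print(scheme, sites, size, clones_for_coverage(size))
```

Because the library size grows exponentially with the number of simultaneously randomized positions, restricting the codon alphabet reduces the screening effort far more for multi-site libraries than for single-site libraries.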

2.2 Prediction of sequence-function relationships 

The second strategy takes advantage of the rapidly growing 
amount of available protein sequences, structures, functional and 
biochemical data. Systematic analyses are based on large number of 
protein sequences and complete protein families to yield insights into 
catalytic mechanisms and evolutionary pathways [33]. By comparing 
the sequences of homologous proteins, consensus or ancestor 
sequences were constructed. Back-to-the-consensus mutations were 
shown to increase stability [34-36] or improve expression [37]. 
Recently, ancestral mutations have been integrated with directed 
evolution to generate a stabilized starting point of highly diverse and 
evolvable gene libraries [38]. Alternatively, multi-sequence alignments 
were analyzed to identify correlated mutations, to identify structurally 
or functionally relevant residues [39, 40], and to predict mutants with 
improved substrate specificity, catalytic activity, or protein stability 
[41]. Sequence-based methods were also applied to predict 
aggregation-prone regions [42] and to design mutants with decreased 
aggregation rates [43]. Multiple sequence alignments assisted by 
structural information were also used to identify subfamily specific 
positions in aldolases [44-46]. 
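
The consensus approach described above can be illustrated with a toy example. The following sketch assumes a precomputed multiple sequence alignment given as equal-length strings with "-" as gap symbol; the function names and the simple majority rule are illustrative assumptions, not the procedure of any of the cited studies.

```python
from collections import Counter


def consensus(alignment: list[str], gap: str = "-") -> str:
    """Most frequent non-gap residue in each alignment column
    (a gap is returned for columns that contain only gaps)."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    residues = []
    for column in zip(*alignment):
        counts = Counter(r for r in column if r != gap)
        residues.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(residues)


def back_to_consensus(target: str, cons: str, gap: str = "-") -> list[str]:
    """Mutations such as 'A3S' that would revert the aligned target sequence
    to the consensus; positions are 1-based and count only target residues."""
    mutations = []
    position = 0
    for t, c in zip(target, cons):
        if t == gap:
            continue  # no residue in the target at this column
        position += 1
        if c != gap and t != c:
            mutations.append(f"{t}{position}{c}")
    return mutations


if __name__ == "__main__":
    aln = ["MKALV", "MKSLV", "MRSLV", "MKS-V"]  # toy alignment, not real data
    cons = consensus(aln)
    print(cons, back_to_consensus(aln[0], cons))
```

In practice, candidate mutations obtained this way still have to be filtered, e.g. by excluding active site residues, before they are tested experimentally.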

While the amount of sequence, structure, and 
biochemical information is steadily increasing, it is generally not 
readily accessible for systematic analysis. Therefore, databases have been 
developed that provide access to enzymatic information, such as 
BRENDA [47], or that integrate information on enzyme families, such 
as DWARF [48] and 3DM [49]. BRENDA (BRaunschweig 
ENzyme DAtabase) offers a comprehensive collection of biochemical 
data on a broad range of enzyme families, which are grouped 
according to their EC numbers, providing information about reaction 
type, products, and substrates, organisms of origin, and an overview of 
available publications. The DWARF system (Data Warehouse system 
for Analyzing pRotein Families) integrates sequence, structure, and 
annotation information of large protein families including lipases 
[50], triterpene cyclases [51], thiamine-diphosphate dependent 
enzymes [52], and lactamases [53]. The 3DM system [54] is based 
on the creation of structure-based multiple sequence alignments. A 
common numbering scheme for structurally equivalent amino acids 
allows for the automated creation of homology models, the analysis of 
correlated or conserved residues and the prediction of functionally 
relevant residues [41, 55]. As of the time of this review, no database 
with a focus on aldolases has been published. 

2.3 Structure-based molecular modelling 

The third strategy starts from information on protein structure 
and seeks to improve stability, activity, specificity, or selectivity by 
molecular modelling. While experimentally determined structure 
information is becoming available for a growing number of proteins through 
the Protein Data Bank [32], the structure is known for only a small 
fraction of all proteins with known sequence. However, if 
sequence similarity is sufficiently high the structure of a protein can 
be modeled based on a sequence comparison to a protein with 
experimentally determined structure. Sequence identities as low as 
25% are usually enough to predict reliable structure models, in some 
cases even sequences with lower sequence identities are suitable for 
homology modeling [56]. Homology modeling programs such as 
Swiss-Model [57], Modeller [58] or Rosetta [59] are based on the 
observation that during evolution structure has been more conserved 
than sequence. Thus, proteins with similar sequence have a similar 
structure. Using these methods, structure models can be derived for 
the majority of soluble proteins as demonstrated by the biennial 
Critical Assessment of Protein Structure Prediction [60]. 
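
Whether a template is suitable for homology modelling is often judged first by the pairwise sequence identity. The sketch below computes the identity over the aligned (non-gap) columns of a given pairwise alignment and applies the 25% rule of thumb mentioned above. It is a minimal illustration under stated assumptions: the alignment is supplied by the user, the function names are hypothetical, and other definitions of sequence identity (e.g. normalised by the length of the shorter sequence) give different values.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where both sequences
    have a residue (gap columns are ignored)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    aligned = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not aligned:
        return 0.0
    matches = sum(x == y for x, y in aligned)
    return 100.0 * matches / len(aligned)


def is_modelling_candidate(a: str, b: str, threshold: float = 25.0) -> bool:
    """True if the aligned pair reaches the identity threshold (default 25%)."""
    return percent_identity(a, b) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACDE", "ACDF"))        # toy sequences, not real data
    print(is_modelling_candidate("ACDEF", "AXXXX"))
```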

Many strategies for protein stabilization have been proposed: 
optimization of the distribution of surface charge-charge interactions 
[61, 62], improvement of core packing [63] and of the protein 
surface [64], and rigidification by introduction of prolines, exchange 
of glycines, introduction of disulfide bridges [65] or mutagenesis at 
positions with high B-factor [66]. However, it is still challenging to 
reliably predict mutations that stabilize the enzyme without affecting 
its activity or selectivity, which are a direct consequence of the 
molecular recognition of the substrate by the enzyme. For a change in 
stereoselectivity the side chains in vicinity of the stereocentre can be 
determined from structural data. These residues can then be split into 
sectors containing two to three residues which are randomized 
simultaneously [67, 68]. To improve activity and selectivity, 
modelling of the enzyme-substrate complex by molecular docking 
methods has been used to study the molecular basis of specificity and 
selectivity, and to predict mutations in the enzyme or modifications of 
the substrate structure that mediate specificity or selectivity [69-71]. 
It is recognized that shape and physico-chemical properties of the 
active site and the substrate binding site are the major driving forces 
to provide the specific interactions between enzyme and the transition 
state of the substrate that lead to catalysis. Moreover, there is 
increasing evidence that flexibility of the enzyme-substrate complex is 
crucial to recognition, because minor structural adjustments can have 
a big impact on the docking score [51]. Docking has been extensively 
used to predict substrate specificity and to identify positions that 
mediate substrate binding. Amino acids that clash with the desired 
substrate upon docking were exchanged, leading to an increase of 
catalytic activity of the enzyme variant toward this substrate [72-74]. 
Catalytic activity is mediated by only a small number of amino acids, 
metals, or cofactors located in the vicinity of the active site. However, 
substrate specificity and selectivity of an enzyme might be determined 
by factors beyond the geometric shape of the active site, such as 
long-range effects of mutations [75, 76] or the effect of a substrate access 
tunnel [77, 78]. Methods of simulating protein structure and 
dynamics have been successfully applied to investigate the molecular 
basis of thermostability [79], temperature optimum [80], or 
specificity and selectivity [81-86]. Simulations are already successfully 
applied as powerful tools to interpret experimental results in 
retrospect, and we are only at the beginning of applying these 
methods for a predictive, rational design of enzymes. 

Figure 1. Protein engineering of TalB. First, TalB was engineered to use DHA as donor in an aldolase reaction. In a second step, the affinity towards non-phosphorylated acceptors was improved. 
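
One of the stabilization strategies named above, mutagenesis at positions with high B-factor, starts from a ranking of residues by their crystallographic B-factors. The following sketch averages the B-factors of all atoms of each residue from ATOM records in the fixed-column PDB format and returns the most flexible residues. It is a minimal illustration, not the protocol of ref. [66]; the function names are hypothetical, and alternate conformations, occupancy, crystal contacts and resolution are ignored.

```python
from collections import defaultdict


def mean_b_factors(pdb_lines):
    """Average B-factor per residue, keyed by (chain, residue number, residue name).
    Uses the fixed PDB columns: resName 18-20, chainID 22, resSeq 23-26,
    tempFactor 61-66 (1-based)."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK, etc.
        key = (line[21], int(line[22:26]), line[17:20].strip())
        totals[key][0] += float(line[60:66])
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def flexible_residues(pdb_lines, top_n=5):
    """The top_n residues with the highest average B-factor, most flexible first."""
    means = mean_b_factors(pdb_lines)
    return sorted(means, key=means.get, reverse=True)[:top_n]


if __name__ == "__main__":
    import sys

    # Usage: python bfactor.py structure.pdb
    with open(sys.argv[1]) as handle:
        for residue in flexible_residues(handle):
            print(residue)
```

Residues selected this way are candidates for saturation mutagenesis; whether a substitution actually rigidifies the protein has to be verified experimentally.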

Though fine-tuning enzymes by point mutations has been 
successfully applied, the resulting enzymes are still limited to a small 
range of reactions catalysed by natural enzymes. Therefore, a major 
challenge of enzyme design is to go beyond the range of natural 
reactions and to design enzymes with new catalytic functions. The 
strategy of transplanting a chemical activity takes advantage of enzyme 
promiscuity [87, 88] and has been successfully applied to engineering 
a lipase into an aldolase [89]. Beyond this, first successful steps have 
been made towards de novo design of enzymes that have a new 
catalytic function: a retro-aldol enzyme was designed which showed a 
rate acceleration of the catalysed versus the uncatalysed reaction by 
10^4 [90]. However, while the de novo designed enzymes are 
functional, their catalytic efficiency is still many orders of magnitude 
below the efficiency of natural enzymes [91]. The efficiency of de 
novo designed enzymes can be increased by directed evolution (see 
section 3.4) [85]. 

3. Selected examples for engineering of aldolases and Tals 

We will use a few examples to demonstrate how computational tools 
were used successfully for engineering of aldolases and transaldolases. 
This includes engineering approaches using localised randomisation, 
i.e. saturation mutagenesis, at positions which have been predicted to 
be important for the engineered property based on structural 
information or models. Unguided use of random mutagenesis, e.g. 
error-prone PCR (epPCR) or DNA shuffling, is beyond the scope of 
this review. For more examples, also of evolutionary approaches, see 
recent reviews [2, 11]. 

3.1 Alteration of substrate specificity 

The high affinity or strict specificity of aldolases towards 
phosphorylated substrates is a major limitation in their biocatalytic 
application. Phosphorylated substrates are often unstable and 
expensive, and the phosphoryl group introduced into the product needs 
to be removed from the final product. Therefore, aldolases with 
higher affinity for non-phosphorylated acceptor and donor substrates 
are highly desired. The binding site of the phosphoryl group of the 
acceptor substrate in a recently engineered DHA dependent aldolase 
(TalB F178Y) [46] was identified by means of a sulphate ion which was 
bound in the active site in the crystal structure [46]. The coordinating 
positions (2x Arg, 1x Ser) were targeted by saturation mutagenesis. 
The generated mutant libraries were screened using a newly developed 
colour assay for variants exhibiting a higher affinity for the 
non-phosphorylated acceptor D-glyceraldehyde [18]. Positive clones were 
identified in the library at position TalB F178Y/R181X. The best 
results were achieved for the TalB F178Y/R181E variant with an at 
least 2-fold improvement in affinity for D-glyceraldehyde (Fig. 1). 
This confirmed the importance of R181 for binding of the 
phosphoryl group. 

Figure 2. Alteration of substrate specificity. Protein engineering towards an increased affinity for non-phosphorylated acceptor and donor substrates in KDPG aldolase (A) and RhuA (B), respectively. C, donor binding sites of FSA and TalB. 



Using a similar approach, the affinity of the pyruvate dependent 
2-keto-3-deoxy-6-phosphogluconate (KDPG) aldolase was increased 
towards non-phosphorylated acceptor substrates [16]. The KDPG 
aldolase variant S184L exhibits an increased catalytic efficiency (2.5- to 
6.5-fold) for uncharged hydrophobic substrates without any alteration 
of its stereoselectivity (Fig. 2A). In a recent study, four residues in the 
aldehyde (acceptor) binding site (G162, G163, S184 and T161) were 
randomised by saturation mutagenesis and positive clones were 
selected using a pyruvate auxotrophic strain [17]. G162, G163 and 
S184 form the phosphoryl group binding site and T161 bridges the 
pyruvate and the aldehyde binding sites. Single substitutions (T161S, 
S184F) lead to an improved catalytic efficiency (4- to 12-fold) for the 
hydrophobic substrates 2-keto-4-hydroxy-octonoate (KHO) and 
(4S)-2-keto-4-hydroxy-4-(2'-pyridyl)butyrate (S-KHPB) compared 
to wild type (wt). This improvement was even more pronounced 
upon a combination of the substitutions (T161S/S184L; 450-fold for 
S-KHPB). Interestingly, the double mutant retained its 
stereoselectivity compared to wt. The hydroxyl group of residue 
T161 seems to be crucial for the wt stereoselectivity. Modelling of 
the C4-epimeric substrates KDPG and KDPGal as Schiff base 
intermediates into the structure of the E. coli enzyme 
suggests that a hydrogen bond network and the correct positioning of 
a water molecule in the active site are important for a stereospecific 
proton transfer and hence for the stereoselectivity of the enzyme. 

In the DHAP dependent L-rhamnulose-1-phosphate aldolase 
(RhuA), the five residues (N29, N32, S75, T115 and S116) forming 
the binding site of the phosphoryl group of the donor substrate were 
substituted by Asp to enable new polar contacts which might increase 
the affinity towards DHA [92]. The individual variants were 
characterised. The introduced mutation (N29D) had only a minor 
effect (2-fold increase) on the yield for aldol adduct formation with 
a non-natural acceptor, which is due to a 3-fold higher Vmax of the 
N29D variant compared to wt. This increase in activity might be 
caused by a direct interaction of the introduced Asp side chain with 
the C1-OH group of the donor. In summary, an aldolase was 
engineered that can use the inexpensive donor DHA and exhibits a 
stereoselectivity (3R,4S) complementary to that of the DHA dependent 
aldolases known so far, FSA and TalB F178Y (3S,4R) (Fig. 2B). 

Figure 3. Protein engineering of NANA. First, NANA was engineered to use the non-natural substrate dipropylamide as acceptor. In a second approach, two stereocomplementary variants were developed. 



Recently, the substrate scope of FSA and TalB F178Y has been 
investigated with respect to the synthesis of deoxysugars [93]. This 
study revealed a complementary donor specificity of the two enzymes. 
FSA prefers hydrophobic donors, such as hydroxyacetone (HA) and 
1-hydroxy-2-butanone (HB), whereas TalB F178Y strongly prefers 
DHA (Fig. 2C). This was rationalised by differences in sequence and 
polarity of the donor binding site. The A129S substitution in 
FSA, which mimics the TalB active site, greatly improved the catalytic 
efficiency towards DHA (17-fold) [94]. The reciprocal 
substitution in TalB F178Y (TalB F178Y/S176A) resulted in an 
increased activity for HA [93]. 

N-acetylneuraminic acid aldolase (NANA) catalyses the aldol 
addition of pyruvate to N-acetylmannosamine and accepts a wide 
range of C5- and C6-aldehydes as substrates. It is used for the 
synthesis of sialic acid derivatives. To extend the substrate scope of 
NANA of E. coli, a semirational approach was used [95, 96]. As there 
was no structure available for the E. coli enzyme in complex with a 
substrate analog, the structure of a complex of a related enzyme (35% 
identity) was used to identify residues that interact with the acceptor 
substrate. At three positions (D191, E192 and S208), saturation 
mutagenesis was performed and the generated libraries were screened 
in retro-aldol direction for pyruvate formation using a dipropylamide 
as model substrate (Fig. 3). The new aldolase (E192N) shows a 49-fold 
increase in catalytic efficiency towards the screening substrate 
compared to wt [96] and an almost 6-fold higher catalytic efficiency 
towards the new substrate than NANA wt towards its natural 
substrate. NANA E192N was successfully applied for the synthesis 
of sialic acid mimetics from substrates with differently substituted 
tertiary amides [95]. The products were obtained in a ~80:20 
mixture of the epimers. 

3.2 Stereoselectivity 

Concomitant with the C-C bond formation, one or two new 
stereocenters are formed in an aldolase-catalysed reaction. Most 
aldolases are strictly stereoselective, but for some the stereoselectivity 
needs to be improved or the design of stereocomplementary enzymes 
is desired. For aldolases, this means that the stereochemical course of 
the reaction needs to be altered, i.e. the nucleophilic attack on the 
carbonyl carbon of the acceptor aldehyde takes place from the 
opposite side. Often the molecular determinants of stereoselectivity 
are not well understood. Therefore, for developing a pair of 
stereocomplementary NANA variants [97], epPCR was applied in the 
first round to identify positions important for control of the 
stereoselectivity. As starting point, the NANA E192N variant was 
selected, which exhibits poor stereoselectivity. In the next rounds, a 
structure-guided approach was used. Only three (A10, T48, S208) of 
the residues identified by epPCR make direct contact with the 
substrate and were selected for separate saturation mutagenesis. 
Additionally, in a related aldolase (KDG aldolase) T167 forms an 
H-bond to the epimeric C4-OH group of the substrate and was 
therefore included. It turned out that the side chain at this position is 
crucial for stereoselectivity. By this approach, two enzymes with 
opposite stereoselectivity (E192N/T167G and E192N/T167V/S208V) 
were designed (Fig. 3). Both enzymes are 
about 50 times more selective (>98:<2) than the parental enzyme. 

from Sulfolobus solfataricus; B, engineering of an R-selective class II aldolase based on BphI. 



The D-2-keto-3-deoxygluconate (KDG) aldolase from Sulfolobus 
solfataricus exhibits poor diastereocontrol and generates a 55:45 
mixture of D-KDGlu and D-KDGal using pyruvate and 
D-glyceraldehyde as substrates. To improve the stereoselectivity of the 
enzyme and to create a pair of stereocomplementary variants, X-ray 
structures of the aldolase with the diastereomeric products bound 
were employed (Fig. 4A) [15]. Interestingly, the (R)-C4-OH and 
(S)-C4-OH groups form similar H-bonds (T157, Y130) but the 
H-bond pattern for the C5-OH and C6-OH differs. A combination of 
saturation mutagenesis at T157 and site-directed mutagenesis was 
used to generate variants specific for D-KDGlu (T157C/Y132V dr 
91%, T157F/Y132V dr 93%) and D-KDGal 
(T157V/A198L/D181Q dr 88%). This higher stereoselectivity came 
at the cost of a lower affinity for the substrates (1.5- to 9-fold 
higher Km) and a lower catalytic activity (60- to 100-fold drop). 
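
To put product ratios such as 55:45 or 98:2 into perspective, they can be converted into a diastereomeric excess and into the corresponding difference in activation free energy between the two competing pathways, ΔΔG‡ = RT ln(major/minor). The sketch below is an illustration under stated assumptions: the product ratio is taken to reflect kinetic control, the temperature of 298.15 K is an assumed default (the actual assay temperatures are not given here), and the function names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # molar gas constant in kJ mol^-1 K^-1


def diastereomeric_excess(major: float, minor: float) -> float:
    """Diastereomeric excess in percent for a major:minor product ratio."""
    return 100.0 * (major - minor) / (major + minor)


def ddg_from_ratio(major: float, minor: float, temperature_k: float = 298.15) -> float:
    """Free energy difference (kJ/mol) between the two competing transition
    states, assuming the product ratio is kinetically controlled."""
    return R_KJ * temperature_k * math.log(major / minor)


if __name__ == "__main__":
    for major, minor in ((55, 45), (80, 20), (98, 2)):
        print(f"{major}:{minor}",
              f"de = {diastereomeric_excess(major, minor):.0f}%",
              f"ddG = {ddg_from_ratio(major, minor):.2f} kJ/mol")
```

Under these assumptions, the step from a nearly unselective enzyme to a >98:<2 variant corresponds to a change of only a few kJ/mol, which illustrates why small changes in the H-bond pattern of the active site can switch the stereochemical outcome.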

The class II aldolase BphI of Burkholderia xenovorans is strictly 
stereoselective for the 4S isomer, as are most stereoselective pyruvate 
dependent aldolases. The aim was to design an R-selective class II 
aldolase. As no structure of BphI is available, the structure of an 
ortholog (DmpG) was used and the substrate 
4-hydroxy-2-oxopentanoate was modeled into the active site of DmpG [71]. 
According to the model, residues L87 and Y290 of BphI should be in 
the vicinity of C4. These residues were targeted by site-directed 
mutagenesis (Fig. 4B). The double mutants (L87N/Y290F and 
L87W/Y290F) were selective for the R-isomer but at the cost of 
lower activity and affinity (effect on kcat/Km < 10-fold in aldol 
addition reactions compared to wt). 

3.3 Change in reaction type 

FSA, although an aldolase, belongs to the enzyme family of 
transaldolases. Therefore, the question is what makes this enzyme an 
aldolase and not a transaldolase. A structure-guided sequence 
alignment of FSA and TalB was used to identify positions close to the 
active site that differ between those two enzymes [46]. These 
positions were targeted by saturation mutagenesis in TalB and the 
generated mutant libraries were screened for formation of 
fructose-6-phosphate from DHA and glyceraldehyde-3-phosphate. For the aldol 
addition reaction, the isolated variant TalB F178Y shows a 70-fold 
improvement in activity compared to TalB wt and a catalytic 
efficiency similar to that of FSA wt (Fig. 1). Hence, with just one amino acid 
replacement a switch in enzyme class was realised. The engineering of 
a DHA dependent aldolase (TalB F178Y) on the TalB scaffold is a 
good example of how a (semi)rational approach was used to change the 
reaction type and, even more, the enzyme class. 

By a single amino acid substitution (Y265A), the pyridoxal 
phosphate (PLP) dependent alanine racemase from Geobacillus 
stearothermophilus was converted into a D-threonine aldolase (Fig. 
5A) [98]. Both enzymes share a common reaction intermediate 
(aldimine between PLP and the respective substrate). The 
D-threonine aldolase uses a His to abstract a proton from the 
cofactor-bound substrate and initiate the C-C bond cleavage step. Using 
structural comparison of the alanine racemase from Geobacillus 
stearothermophilus and the threonine aldolase from Thermotoga 
maritima, a His (H166) on the opposite side of the cofactor was 
identified that does not interact with the substrate directly but forms 
an H-bond to Y265. It was proposed that a Y265A substitution 
would generate more space in the active site and put H166 in the 
right position to act as general base in an aldol reaction. The new 
aldolase shows a 2.3 × 10^5-fold increase in aldolase activity and a 
4 × 10^3-fold decrease in racemase activity with high stereoselectivity for 
the D-isomer. 

The promiscuous activity of Candida antarctica lipase B (CALB, 
EC 3.1.1.3) for an aldol addition was enhanced by site-directed 
mutagenesis based on quantum chemical calculations [89]. The 
proposed mechanism differs from natural aldolases as the enolate 
intermediate is supposed to be stabilized by the oxyanion hole. By 
replacement of Ser105 of the catalytic triad by Ala, the aldolase 
activity was increased 4-fold (Fig. 5B). However, the activity is much 
lower than that of natural aldolases. Still, the high stability of CALB, e.g. 
in organic solvents, could be an advantage for biocatalytic 
applications. 

Figure 5. Non-natural aldolases. Examples for aldolase reactions catalyzed by other enzymes: A, alanine racemase, B, lipase and C, tautomerase. 

The 4-oxalocrotonate tautomerase of Pseudomonas putida mt-2 
exhibits a promiscuous aldolase and dehydratase activity for the 
formation of cinnamaldehyde from acetaldehyde and benzaldehyde 
(Fig. 5C) [99]. Here, the N-terminal Pro acts as nucleophile and 
forms an enamine intermediate with acetaldehyde. The catalytic 
activity was improved 16-fold (L8R [99]) and 600-fold (F50A, 
kcat/Km 0.5 M^-1 s^-1 [100]) by single point mutations. 

3.4 De novo design 

A retro-aldolase for a non-natural substrate was developed by de 
novo design using the RosettaMatch algorithm [90]. De novo design 
of an aldolase is especially challenging as the reaction mechanism 
involves multiple steps of protonation and deprotonation and a 
network of long charged side chains and hydrogen bonds. As starting 
point, the catalytic mechanism involving a Lys and a Schiff base 
intermediate was chosen. Four different motifs were selected varying 
in their interactions to stabilize a composite transition state which is 
simultaneously compatible to multiple transition states and reaction 
intermediates. 42 of the 72 experimentally tested designs exhibit 
retro-aldolase activity in the screening reaction (Fig. 6). The active 
designs occur in five different protein scaffolds. The most active 
design shows a 2 × 10^4 enhancement over the uncatalysed reaction 
(kcat/kuncat) but is far less active than natural enzymes. The kcat values 
for the active designs were around 10^-3 min^-1, which is at the lower end 
of the range of kcat values for the catalytic antibody 38C2 (10^-3 to 5 
min^-1) [101]. In contrast, natural aldolases exhibit kcat values around 
10^3 min^-1 [102]. The most active designs include a water molecule in 
the active site which is coordinated by a Tyr residue and mediates the 
protonation and deprotonation steps. A similar scenario is found in 
native aldolases, e.g. FSA. However, a later study [103] revealed that 
the coordination of this water molecule by a Tyr residue in the active 
site does not contribute to the rate enhancement. A replacement of 
the Tyr by Phe resulted even in an increased activity. The lowering of 
the pKa value of the catalytic lysine residue by the surrounding 
hydrophobic pocket seems to have an effect on the rate enhancement 
and the largest contribution stems from the interaction of the 
substrate with its hydrophobic binding pocket [103]. 

Molecular dynamics simulations highlighted the importance of 
including protein dynamics and fluctuation as well as the orientation of 
the substrate in the active site in early stages of de novo design 
approaches [86]. Therefore, in a recent study [85] the design process 
was repeated for the motif comprising a Lys in a hydrophobic pocket 
and a water molecule, but more care was taken with the rotamer 
sampling, the preorganization and positioning of side chains and 
the packing of the active site. This resulted in a reproducible design of 
retro-aldolases with a very high success rate of 75%, i.e. 75% of the 
experimentally tested designs exhibited rates > 10-fold compared to 
the uncatalysed reaction in buffer. Still, the designed 
retro-aldolases are not more active than the ones in the original study [90]. 



Volume No: 2, Issue: 3, September 2012, e201209016 



Computational and Structural Biotechnology Journal | www.csbj.org 






So the question remained how the gap in activity between 
designed and natural enzymes can be closed. Optimisation of the designed aldolases 
by several rounds of mutagenesis and screening resulted in a < 100-fold 
increase in kcat/Km (12 M^-1 s^-1) [85, 104]. These investigations 
revealed the following limiting factors: (i) low specificity and hence inhibition by 
products, (ii) hydrophobic packing and positioning of the substrate, and (iii) 
hydrophobic packing and positioning of the catalytic Lys, which affects its 
reactivity. 

Figure 6. Screening reaction used for de novo design of a retro-aldolase. The retro-aldol activity was monitored in 96-well plates by cleavage of a non-natural substrate. Upon cleavage, a fluorogenic naphthyl derivative is released. 

Outlook and conclusion 

The integration of sequence and structure information for the 
generation of focused libraries was widely applied for the protein 
engineering of aldolases. However, for some enzymatic properties 
such as the stereoselectivity the molecular determinants are still not 
well understood. The synthesis of enantiopure products is one big 
advantage of the application of enzymes compared to "classical" 
organic chemistry. But not all enzymes are strictly stereoselective, 
especially not with non-natural substrates, and a corresponding enzyme 
does not exist for every possible stereoconfiguration (e.g. DHA 
dependent aldolases). Future protein engineering studies will try to 
generate new aldolases and give more insights on the molecular 
determinants for stereoselectivity. 

Systematic sequence analysis has not been much exploited for 
aldolases and transaldolases. We are currently setting up a database 
for the transaldolase family to get more information about subfamily 
specific residues, and the natural diversity of aldolases. This might 
allow us to discover new aldolases with interesting properties for 
biocatalytic applications. 

Although de novo design of retro-aldolases gave promising results, 
the catalytic activity even of the optimised variants is still orders of 
magnitude lower than that of natural aldolases. Therefore, the 
computational tools need to be improved to close the gap in activity 
between the designed and native enzymes. It is not clear whether 
protein engineering or evolution can close this gap or if we need a 
better design as starting point. Especially, reactions involving multiple 
steps such as the aldolase reaction are challenging. Here, each step 
needs to be considered and not only the rate-limiting step for the 
natural enzyme. Long charged side chains need to be positioned 
correctly and a water and H-bond network needs to be designed. 
Considering the molecular dynamics is important as proteins are not 
rigid scaffolds. Furthermore, the specificity of the designed aldolases 
needs to be improved as product inhibition was a problem. 